Results 1 - 20 of 96
1.
Article in English | MEDLINE | ID: mdl-38426904

ABSTRACT

BACKGROUND: Somatic mutational signatures elucidate molecular vulnerabilities to therapy and therefore detecting signatures and classifying tumors with respect to signatures has clinical value. However, identifying the etiology of the mutational signatures remains a statistical challenge, with both small sample sizes and high variability in classification algorithms posing barriers. As a result, few signatures have been strongly linked to particular risk factors. METHODS: Here we develop a statistical model, Diffsig, for estimating the association of one or more continuous or categorical risk factors with DNA mutational signatures. Diffsig takes into account the uncertainty associated with assigning signatures to samples as well as multiple risk factors' simultaneous effect on observed DNA mutations. RESULTS: We applied Diffsig to breast cancer data to assess relationships between five established breast-relevant mutational signatures and etiologic variables, confirming known mechanisms of cancer development. In simulation, our model was capable of accurately estimating expected associations in a variety of contexts. CONCLUSIONS: Diffsig allows researchers to quantify and perform inference on the associations of risk factors with mutational signatures. IMPACT: We expect Diffsig to provide more robust associations of risk factors with signatures to lead to better understanding of the tumor development process and improved models of tumorigenesis.

2.
bioRxiv ; 2023 Oct 27.
Article in English | MEDLINE | ID: mdl-37961277

ABSTRACT

Complete characterization of the genetic effects on gene expression is needed to elucidate tissue biology and the etiology of complex traits. Here, we analyzed 2,344 subcutaneous adipose tissue samples and identified 34K conditionally distinct expression quantitative trait locus (eQTL) signals in 18K genes. Over half of eQTL genes exhibited at least two eQTL signals. Compared to primary signals, non-primary signals had lower effect sizes, lower minor allele frequencies, and less promoter enrichment; they corresponded to genes with higher heritability and higher tolerance for loss of function. Colocalization of eQTL with conditionally distinct genome-wide association study signals for 28 cardiometabolic traits identified 3,605 eQTL signals for 1,861 genes. Inclusion of non-primary eQTL signals increased colocalized signals by 46%. Among 30 genes with ≥2 pairs of colocalized signals, 21 showed a mediating gene dosage effect on the trait. Thus, expanded eQTL identification reveals more mechanisms underlying complex traits and improves understanding of the complexity of gene expression regulation.

3.
Cell Genom ; 3(10): 100404, 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37868037

ABSTRACT

Genome-wide association studies (GWASs) have successfully identified 145 genomic regions that contribute to schizophrenia risk, but linkage disequilibrium makes it challenging to discern causal variants. We performed a massively parallel reporter assay (MPRA) on 5,173 fine-mapped schizophrenia GWAS variants in primary human neural progenitors and identified 439 variants with allelic regulatory effects (MPRA-positive variants). Transcription factor binding had modest predictive power, while fine-map posterior probability, enhancer overlap, and evolutionary conservation failed to predict MPRA-positive variants. Furthermore, 64% of MPRA-positive variants did not exhibit an expression quantitative trait loci (eQTL) signature, suggesting that MPRA could identify as yet unexplored variants with regulatory potential. To predict the combinatorial effect of MPRA-positive variants on gene regulation, we propose an accessibility-by-contact model that combines MPRA-measured allelic activity with neuronal chromatin architecture.

4.
bioRxiv ; 2023 Sep 01.
Article in English | MEDLINE | ID: mdl-37693528

ABSTRACT

The function of some genetic variants associated with brain-relevant traits has been explained through colocalization with expression quantitative trait loci (eQTL) conducted in bulk post-mortem adult brain tissue. However, many brain-trait associated loci have unknown cellular or molecular function. These genetic variants may exert context-specific function on different molecular phenotypes including post-transcriptional changes. Here, we identified genetic regulation of RNA-editing and alternative polyadenylation (APA), within a cell-type-specific population of human neural progenitors and neurons. More RNA-editing and isoforms utilizing longer polyadenylation sequences were observed in neurons, likely due to higher expression of genes encoding the proteins mediating these post-transcriptional events. We also detected hundreds of cell-type-specific editing quantitative trait loci (edQTLs) and alternative polyadenylation QTLs (apaQTLs). We found colocalizations of a neuron edQTL in CCDC88A with educational attainment and a progenitor apaQTL in EP300 with schizophrenia, suggesting genetically mediated post-transcriptional regulation during brain development leads to differences in brain function.

5.
medRxiv ; 2023 Sep 10.
Article in English | MEDLINE | ID: mdl-37732177

ABSTRACT

CRISPR base editing screens are powerful tools for studying disease-associated variants at scale. However, the efficiency and precision of base editing perturbations vary, confounding the assessment of variant-induced phenotypic effects. Here, we provide an integrated pipeline that improves the estimation of variant impact in base editing screens. We perform high-throughput ABE8e-SpRY base editing screens with an integrated reporter construct to measure the editing efficiency and outcomes of each gRNA alongside their phenotypic consequences. We introduce BEAN, a Bayesian network that accounts for per-guide editing outcomes and target site chromatin accessibility to estimate variant impacts. We show this pipeline attains superior performance compared to existing tools in variant classification and effect size quantification. We use BEAN to pinpoint common variants that alter LDL uptake, implicating novel genes. Additionally, through saturation base editing of LDLR, we enable accurate quantitative prediction of the effects of missense variants on LDL-C levels, which aligns with measurements in UK Biobank individuals, and identify structural mechanisms underlying variant pathogenicity. This work provides a widely applicable approach to improve the power of base editor screens for disease-associated variant characterization.

6.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37738402

ABSTRACT

Understanding the function of the human microbiome is important but the development of statistical methods specifically for microbial gene expression (i.e. metatranscriptomics) is in its infancy. Many currently employed differential expression analysis methods have been designed for different data types and have not been evaluated in metatranscriptomics settings. To address this gap, we undertook a comprehensive evaluation and benchmarking of 10 differential analysis methods for metatranscriptomics data. We used a combination of real and simulated data to evaluate performance (i.e. type I error, false discovery rate and sensitivity) of the following methods: log-normal (LN), logistic-beta (LB), MAST, DESeq2, metagenomeSeq, ANCOM-BC, LEfSe, ALDEx2, Kruskal-Wallis and two-part Kruskal-Wallis. The simulation was informed by supragingival biofilm microbiome data from 300 preschool-age children enrolled in a study of childhood dental disease (early childhood caries, ECC), whereas validations were sought in two additional datasets from the ECC study and an inflammatory bowel disease study. The LB test showed the highest sensitivity in both small and large samples and reasonably controlled type I error. In contrast, MAST was hampered by inflated type I error. Upon application of the LN and LB tests in the ECC study, we found that genes C8PHV7 and C8PEV7, harbored by the lactate-producing Campylobacter gracilis, had the strongest association with childhood dental disease. This comprehensive model evaluation offers practical guidance for selection of appropriate methods for rigorous analyses of differential expression in metatranscriptomics. Selection of an optimal method increases the possibility of detecting true signals while minimizing the chance of claiming false ones.
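Of the ten methods listed in this abstract, the Kruskal-Wallis test is simple enough to sketch directly. The following is a minimal illustration of the textbook H statistic with the standard tie correction, not the benchmark's own implementation; the function names `average_ranks` and `kruskal_wallis_h` are hypothetical, and the two-part variant named in the abstract is not covered.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks for `values`; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of samples (hypothetical helper)."""
    pooled = [x for group in groups for x in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    weighted = 0.0
    offset = 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        weighted += rank_sum ** 2 / len(group)
        offset += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over counts t of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:  # every observation identical: no evidence of a shift
        return 0.0
    return h / correction
```

Under the null hypothesis, H is approximately chi-squared distributed with (number of groups - 1) degrees of freedom, which is how a p-value would be obtained in practice.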


Subjects
Benchmarking; Stomatognathic Diseases; Child; Humans; Child, Preschool; Biofilms; Computer Simulation; Lactic Acid
7.
Genome Res ; 33(8): 1258-1268, 2023 08.
Article in English | MEDLINE | ID: mdl-37699658

ABSTRACT

Three-dimensional (3D) chromatin structure has been shown to play a role in regulating gene transcription during biological transitions. Although our understanding of loop formation and maintenance is rapidly improving, much less is known about the mechanisms driving changes in looping and the impact of differential looping on gene transcription. One limitation has been a lack of well-powered differential looping data sets. To address this, we conducted a deeply sequenced Hi-C time course of megakaryocyte development comprising four biological replicates and 6 billion reads per time point. Statistical analysis revealed 1503 differential loops. Gained loop anchors were enriched for AP-1 occupancy and were characterized by large increases in histone H3K27ac (over 11-fold) but relatively small increases in CTCF and RAD21 binding (1.26- and 1.23-fold, respectively). Linear modeling revealed that changes in histone H3K27ac, chromatin accessibility, and JUN binding were better correlated with changes in looping than RAD21 and almost as well correlated as CTCF. Changes to epigenetic features between, rather than at, boundaries were highly predictive of changes in looping. Together these data suggest that although CTCF and RAD21 may be the core machinery dictating where loops form, other features (both at the anchors and within the loop boundaries) may play a larger role than previously anticipated in determining the relative loop strength across cell types and conditions.


Subjects
Chromatin; Histones; Histones/metabolism; CCCTC-Binding Factor/genetics; CCCTC-Binding Factor/metabolism; Chromatin/genetics; Chromosomes/metabolism; Cell Differentiation/genetics
8.
bioRxiv ; 2023 Jul 29.
Article in English | MEDLINE | ID: mdl-37546772

ABSTRACT

Background: Reproducibility of human cortical organoid (hCO) phenotypes remains a concern for modeling neurodevelopmental disorders. While guided hCO protocols reproducibly generate cortical cell types in multiple cell lines at one site, variability across sites using a harmonized protocol has not yet been evaluated. We present an hCO cross-site reproducibility study examining multiple phenotypes. Methods: Three independent research groups generated hCOs from one induced pluripotent stem cell (iPSC) line using a harmonized miniaturized spinning bioreactor protocol. scRNA-seq, 3D fluorescent imaging, phase contrast imaging, qPCR, and flow cytometry were used to characterize the 3 month differentiations across sites. Results: In all sites, hCOs were mostly cortical progenitor and neuronal cell types in reproducible proportions with moderate to high fidelity to the in vivo brain that were consistently organized in cortical wall-like buds. Cross-site differences were detected in hCO size and morphology. Differential gene expression showed differences in metabolism and cellular stress across sites. Although iPSC culture conditions were consistent and iPSCs remained undifferentiated, primed stem cell marker expression prior to differentiation correlated with cell type proportions in hCOs. Conclusions: We identified hCO phenotypes that are reproducible across sites using a harmonized differentiation protocol. Previously described limitations of hCO models were also reproduced including off-target differentiations, necrotic cores, and cellular stress. Improving our understanding of how stem cell states influence early hCO cell types may increase reliability of hCO differentiations. Cross-site reproducibility of hCO cell type proportions and organization lays the foundation for future collaborative prospective meta-analytic studies modeling neurodevelopmental disorders in hCOs.

9.
Diabetes ; 72(11): 1707-1718, 2023 11 01.
Article in English | MEDLINE | ID: mdl-37647564

ABSTRACT

Understanding differences in adipose gene expression between individuals with different levels of clinical traits may reveal the genes and mechanisms leading to cardiometabolic diseases. However, adipose is a heterogeneous tissue. To account for cell-type heterogeneity, we estimated cell-type proportions in 859 subcutaneous adipose tissue samples with bulk RNA sequencing (RNA-seq) using a reference single-nuclear RNA-seq data set. Cell-type proportions were associated with cardiometabolic traits; for example, higher macrophage and adipocyte proportions were associated with higher and lower BMI, respectively. We evaluated cell-type proportions and BMI as covariates in tests of association between >25,000 gene expression levels and 22 cardiometabolic traits. For >95% of genes, the optimal, or best-fit, models included BMI as a covariate, and for 79% of associations, the optimal models also included cell type. After adjusting for the optimal covariates, we identified 2,664 significant associations (P ≤ 2e-6) for 1,252 genes and 14 traits. Among genes proposed to affect cardiometabolic traits based on colocalized genome-wide association study and adipose expression quantitative trait locus signals, 25 showed a corresponding association between trait and gene expression levels. Overall, these results suggest the importance of modeling cell-type proportion when identifying gene expression associations with cardiometabolic traits.


Subjects
Cardiovascular Diseases; Genome-Wide Association Study; Humans; Body Mass Index; Obesity/genetics; Gene Expression; Cardiovascular Diseases/genetics
10.
Genome Biol ; 24(1): 165, 2023 07 12.
Article in English | MEDLINE | ID: mdl-37438847

ABSTRACT

Detecting allelic imbalance at the isoform level requires accounting for inferential uncertainty, caused by multi-mapping of RNA-seq reads. Our proposed method, SEESAW, uses Salmon and Swish to offer analysis at various levels of resolution, including gene, isoform, and aggregating isoforms to groups by transcription start site. The aggregation strategies strengthen the signal for transcripts with high uncertainty. The SEESAW suite of methods is shown to have higher power than other allelic imbalance methods when there is isoform-level allelic imbalance. We also introduce a new test for detecting imbalance that varies across a covariate, such as time.


Subjects
Allelic Imbalance; Uncertainty; Protein Isoforms/genetics; RNA-Seq; Transcription Initiation Site
11.
iScience ; 26(6): 106961, 2023 Jun 16.
Article in English | MEDLINE | ID: mdl-37378336

ABSTRACT

A certain degree of uncertainty is always associated with the transcript abundance estimates. The uncertainty may make many downstream analyses, such as differential testing, difficult for certain transcripts. Conversely, gene-level analysis, though less ambiguous, is often too coarse-grained. We introduce TreeTerminus, a data-driven approach for grouping transcripts into a tree structure where leaves represent individual transcripts and internal nodes represent an aggregation of a transcript set. TreeTerminus constructs trees such that, on average, the inferential uncertainty decreases as we ascend the tree topology. The tree provides the flexibility to analyze data at nodes that are at different levels of resolution in the tree and can be tuned depending on the analysis of interest. We evaluated TreeTerminus on two simulated and two experimental datasets and observed an improved performance compared to transcripts (leaves) and other methods under several different metrics.

12.
Nat Methods ; 20(8): 1187-1195, 2023 08.
Article in English | MEDLINE | ID: mdl-37308696

ABSTRACT

Most approaches to transcript quantification rely on fixed reference annotations; however, the transcriptome is dynamic and depending on the context, such static annotations contain inactive isoforms for some genes, whereas they are incomplete for others. Here we present Bambu, a method that performs machine-learning-based transcript discovery to enable quantification specific to the context of interest using long-read RNA-sequencing. To identify novel transcripts, Bambu estimates the novel discovery rate, which replaces arbitrary per-sample thresholds with a single, interpretable, precision-calibrated parameter. Bambu retains the full-length and unique read counts, enabling accurate quantification in the presence of inactive isoforms. Compared to existing methods for transcript discovery, Bambu achieves greater precision without sacrificing sensitivity. We show that context-aware annotations improve quantification for both novel and known transcripts. We apply Bambu to quantify isoforms from repetitive HERVH-LTR7 retrotransposons in human embryonic stem cells, demonstrating the ability for context-specific transcript expression analysis.


Subjects
Gene Expression Profiling; Transcriptome; Humans; RNA-Seq; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Protein Isoforms/genetics
13.
PLoS Genet ; 19(5): e1010517, 2023 05.
Article in English | MEDLINE | ID: mdl-37216410

ABSTRACT

Integrative approaches that simultaneously model multi-omics data have gained increasing popularity because they provide holistic system biology views of multiple or all components in a biological system of interest. Canonical correlation analysis (CCA) is a correlation-based integrative method designed to extract latent features shared between multiple assays by finding the linear combinations of features, referred to as canonical variables (CVs), within each assay that achieve maximal across-assay correlation. Although widely acknowledged as a powerful approach for multi-omics data, CCA has not been systematically applied to multi-omics data in large cohort studies, which have only recently become available. Here, we adapted sparse multiple CCA (SMCCA), a widely used derivative of CCA, to proteomics and methylomics data from the Multi-Ethnic Study of Atherosclerosis (MESA) and Jackson Heart Study (JHS). To tackle challenges encountered when applying SMCCA to MESA and JHS, our adaptations include the incorporation of the Gram-Schmidt (GS) algorithm with SMCCA to improve orthogonality among CVs, and the development of Sparse Supervised Multiple CCA (SSMCCA) to allow supervised integration analysis for more than two assays. Effective application of SMCCA to the two real datasets reveals important findings. Applying our SMCCA-GS to MESA and JHS, we identified strong associations between blood cell counts and protein abundance, suggesting that adjustment of blood cell composition should be considered in protein-based association studies. Importantly, CVs obtained from two independent cohorts also demonstrate transferability across the cohorts. For example, proteomic CVs learned from JHS, when transferred to MESA, explain similar amounts of blood cell count phenotypic variance: 39.0% to 50.0% in JHS and 38.9% to 49.1% in MESA. Similar transferability was observed for other omics-CV-trait pairs. This suggests that biologically meaningful and cohort-agnostic variation is captured by CVs. We anticipate that applying our SMCCA-GS and SSMCCA on various cohorts would help identify cohort-agnostic biologically meaningful relationships between multi-omics data and phenotypic traits.
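The abstract names the Gram-Schmidt algorithm as the step used to improve orthogonality among CVs but does not describe how it is wired into SMCCA. The sketch below therefore shows only the generic orthogonalization itself (the modified Gram-Schmidt variant), applied to plain Python lists; the function name `gram_schmidt` and the `tol` parameter are assumptions made for illustration.

```python
def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize `vectors` in order (modified Gram-Schmidt).

    Each vector has its projections onto the already accepted basis
    vectors removed, then is scaled to unit length. Vectors that are
    (numerically) linear combinations of earlier ones are dropped.
    """
    basis = []
    for vector in vectors:
        residual = list(vector)
        for b in basis:
            projection = sum(r * bi for r, bi in zip(residual, b))
            residual = [r - projection * bi for r, bi in zip(residual, b)]
        norm = sum(r * r for r in residual) ** 0.5
        if norm > tol:
            basis.append([r / norm for r in residual])
    return basis
```

In a CCA setting the inputs would be the per-sample scores of successive canonical variables, so that each later CV is made uncorrelated with the earlier ones (assuming the scores are centered).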


Subjects
Canonical Correlation Analysis; Proteomics; Humans; Proteomics/methods; Multiomics; Cohort Studies
14.
Genome Biol ; 24(1): 130, 2023 05 30.
Article in English | MEDLINE | ID: mdl-37254169

ABSTRACT

BACKGROUND: Genetic variation influences both chromatin accessibility, assessed in chromatin accessibility quantitative trait loci (caQTL) studies, and gene expression, assessed in expression QTL (eQTL) studies. Genetic variants can impact either nearby genes (cis-eQTLs) or distal genes (trans-eQTLs). Colocalization between caQTL and eQTL, or cis- and trans-eQTLs suggests that they share causal variants. However, pairwise colocalization between these molecular QTLs does not guarantee a causal relationship. Mediation analysis can be applied to assess the evidence supporting causality versus independence between molecular QTLs. Given that the function of QTLs can be cell-type-specific, we performed mediation analyses to find epigenetic and distal regulatory causal pathways for genes within two major cell types of the developing human cortex, progenitors and neurons. RESULTS: We find that the expression of 168 and 38 genes is mediated by chromatin accessibility in progenitors and neurons, respectively. We also find that the expression of 11 and 12 downstream genes is mediated by upstream genes in progenitors and neurons. Moreover, we discover that a genetic locus associated with inter-individual differences in brain structure shows evidence for mediation of SLC26A7 through chromatin accessibility, identifying molecular mechanisms of a common variant association to a brain trait. CONCLUSIONS: In this study, we identify cell-type-specific causal gene regulatory networks whereby the impacts of variants on gene expression were mediated by chromatin accessibility or distal gene expression. Identification of these causal paths will enable identifying and prioritizing actionable regulatory targets perturbing these key processes during neurodevelopment.


Subjects
Gene Regulatory Networks; Genome-Wide Association Study; Humans; Quantitative Trait Loci; Chromatin; Phenotype; Polymorphism, Single Nucleotide
15.
Bioinformatics ; 39(5)2023 05 04.
Article in English | MEDLINE | ID: mdl-37042725

ABSTRACT

MOTIVATION: Enrichment analysis is a widely utilized technique in genomic analysis that aims to determine if there is a statistically significant association between two sets of genomic features. To conduct this type of hypothesis testing, an appropriate null model is typically required. However, the null distribution that is commonly used can be overly simplistic and may result in inaccurate conclusions. RESULTS: bootRanges provides fast functions for generation of block bootstrapped genomic ranges representing the null hypothesis in enrichment analysis. As part of a modular workflow, bootRanges offers greater flexibility for computing various test statistics leveraging other Bioconductor packages. We show that shuffling or permutation schemes may result in overly narrow test statistic null distributions and over-estimation of statistical significance, while creating new range sets with a block bootstrap preserves local genomic correlation structure and generates more reliable null distributions. It can also be used in more complex analyses, such as assessing correlations between cis-regulatory elements (CREs) and genes across cell types or providing optimized thresholds, e.g. log fold change (logFC) from differential analysis. AVAILABILITY AND IMPLEMENTATION: bootRanges is freely available in the R/Bioconductor package nullranges hosted at https://bioconductor.org/packages/nullranges.
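The contrast between shuffling and a block bootstrap can be illustrated on a one-dimensional sequence. This is a minimal sketch under simplifying assumptions: it resamples a plain Python list rather than genomic ranges, it is not the nullranges API, and the function name `block_bootstrap` and its parameters are hypothetical.

```python
import random


def block_bootstrap(values, block_length, rng=None):
    """Resample `values` by concatenating randomly chosen contiguous blocks.

    Blocks are drawn with replacement, so neighbors inside a block keep
    their original order and spacing, whereas an element-wise shuffle
    would destroy that local structure.
    """
    if rng is None:
        rng = random.Random()
    n = len(values)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(values)")
    resampled = []
    while len(resampled) < n:
        start = rng.randrange(n - block_length + 1)
        resampled.extend(values[start:start + block_length])
    return resampled[:n]  # trim the final block to the original length
```

Repeating the resampling many times and recomputing the statistic of interest on each replicate gives an empirical null distribution; with `block_length=1` the procedure reduces to an ordinary element-wise bootstrap.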


Subjects
Genome; Genomics; Genomics/methods; Software
16.
Bioinformatics ; 39(5)2023 05 04.
Article in English | MEDLINE | ID: mdl-37084270

ABSTRACT

MOTIVATION: Deriving biological insights from genomic data commonly requires comparing attributes of selected genomic loci to a null set of loci. The selection of this null set is non-trivial, as it requires careful consideration of potential covariates, a problem that is exacerbated by the non-uniform distribution of genomic features including genes, enhancers, and transcription factor binding sites. Propensity score-based covariate matching methods allow the selection of null sets from a pool of possible items while controlling for multiple covariates; however, existing packages do not operate on genomic data classes and can be slow for large data sets making them difficult to integrate into genomic workflows. RESULTS: To address this, we developed matchRanges, a propensity score-based covariate matching method for the efficient and convenient generation of matched null ranges from a set of background ranges within the Bioconductor framework. AVAILABILITY AND IMPLEMENTATION: Package: https://bioconductor.org/packages/nullranges, Code: https://github.com/nullranges, Documentation: https://nullranges.github.io/nullranges.
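The matching step at the core of propensity score-based null set selection can be sketched as follows. The sketch assumes the propensity scores have already been estimated elsewhere (for example by a logistic regression of focal-versus-pool membership on the covariates) and uses greedy nearest-neighbor matching without replacement, which is one common choice and not necessarily what matchRanges does; the function name `match_by_propensity` is hypothetical.

```python
def match_by_propensity(focal_scores, pool_scores):
    """Greedy nearest-neighbor matching without replacement.

    For each focal item, pick the unused pool item whose propensity
    score is closest. Returns the chosen pool indices, one per focal item.
    """
    if len(pool_scores) < len(focal_scores):
        raise ValueError("pool must be at least as large as the focal set")
    available = set(range(len(pool_scores)))
    matches = []
    for score in focal_scores:
        # Break distance ties by the lower pool index so results are deterministic.
        best = min(available, key=lambda j: (abs(pool_scores[j] - score), j))
        matches.append(best)
        available.remove(best)
    return matches
```

The selected pool items form a null set whose covariate distribution resembles that of the focal set, to the extent that the scores summarize the covariates well. Greedy matching depends on the order of the focal items, which is a known limitation of this simple variant.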


Subjects
Genomics; Software; Genomics/methods; Genome; Regulatory Sequences, Nucleic Acid; Research Design
17.
Bioinformatics ; 39(4)2023 04 03.
Article in English | MEDLINE | ID: mdl-37067481

ABSTRACT

SUMMARY: Exclusion regions are sections of reference genomes with abnormal pileups of short sequencing reads. Removing reads overlapping them improves biological signal, and these benefits are most pronounced in differential analysis settings. Several labs created exclusion region sets, available primarily through ENCODE and GitHub. However, the variety of exclusion sets creates uncertainty about which sets to use. Furthermore, gap regions (e.g. centromeres, telomeres, short arms) create additional considerations in generating exclusion sets. We generated exclusion sets for the latest human T2T-CHM13 and mouse GRCm39 genomes and systematically assembled and annotated these and other sets in the excluderanges R/Bioconductor data package, also accessible via the BEDbase.org API. The package provides unified access to 82 GenomicRanges objects covering six organisms, multiple genome assemblies, and types of exclusion regions. For human hg38 genome assembly, we recommend hg38.Kundaje.GRCh38_unified_blacklist as the most well-curated and annotated, and sets generated by the Blacklist tool for other organisms. AVAILABILITY AND IMPLEMENTATION: https://bioconductor.org/packages/excluderanges/. Package website: https://dozmorovlab.github.io/excluderanges/.


Subjects
Genome, Human; Software; Animals; Humans; Mice; Uncertainty
18.
medRxiv ; 2023 Mar 06.
Article in English | MEDLINE | ID: mdl-36945630

ABSTRACT

Genomic regulatory elements active in the developing human brain are notably enriched in genetic risk for neuropsychiatric disorders, including autism spectrum disorder (ASD), schizophrenia, and bipolar disorder. However, prioritizing the specific risk genes and candidate molecular mechanisms underlying these genetic enrichments has been hindered by the lack of a single unified large-scale gene regulatory atlas of human brain development. Here, we uniformly process and systematically characterize gene, isoform, and splicing quantitative trait loci (xQTLs) in 672 fetal brain samples from unique subjects across multiple ancestral populations. We identify 15,752 genes harboring a significant xQTL and map 3,739 eQTLs to a specific cellular context. We observe a striking drop in gene expression and splicing heritability as the human brain develops. Isoform-level regulation, particularly in the second trimester, mediates the greatest proportion of heritability across multiple psychiatric GWAS, compared with eQTLs. Via colocalization and TWAS, we prioritize biological mechanisms for ~60% of GWAS loci across five neuropsychiatric disorders, nearly two-fold that observed in the adult brain. Finally, we build a comprehensive set of developmentally regulated gene and isoform co-expression networks capturing unique genetic enrichments across disorders. Together, this work provides a comprehensive view of genetic regulation across human brain development as well as the stage- and cell type-informed mechanistic underpinnings of neuropsychiatric disorders.

19.
Cancer Res Commun ; 3(1): 12-20, 2023 01.
Article in English | MEDLINE | ID: mdl-36968228

ABSTRACT

Markers of genomic instability, including TP53 status and homologous recombination deficiency (HRD), are candidate biomarkers of immunogenicity and immune-mediated survival, but little is known about the distribution of these markers in large, population-based cohorts of racially diverse patients with breast cancer. In prior clinical trials, DNA-based approaches have been emphasized, but recent data suggest that RNA-based assessment can capture pathway differences conveniently and may be streamlined with other RNA-based genomic risk scores. Thus, we used RNA expression to study genomic instability (HRD and TP53 pathways) in the context of the breast cancer immune microenvironment in three datasets (total n = 4,892), including 1,942 samples from the Carolina Breast Cancer Study, a population-based study that oversampled Black (n = 1,026) and younger women (n = 1,032). Across all studies, 36.9% of estrogen receptor (ER)-positive and 92.6% of ER-negative breast cancer had presence of at least one genomic instability signature. TP53 and HRD status were significantly associated with immune expression in both ER-positive and ER-negative breast cancer. RNA-based genomic instability signatures were associated with higher PD-L1, CD8 T-cell marker, and global and multimarker immune cell expression. Among tumors with genomic instability signatures, adaptive immune response was associated with improved recurrence-free survival regardless of ER status, highlighting genomic instability as a candidate marker for predicting immunotherapy response. Leveraging a convenient, integrated RNA-based approach, this analysis shows that genomic instability interacts with immune response, an important target in breast cancer overall and in Black women who experience a higher frequency of TP53 and HR deficiency. Significance: Despite promising advances in breast cancer immunotherapy, predictive biomarkers that are valid across diverse populations and breast cancer subtypes are needed. Genomic instability signatures can be coordinated with other RNA-based scores to define immunogenic breast cancers and may have value in stratifying immunotherapy trial participants.


Subjects
Breast Neoplasms; Humans; Female; Breast Neoplasms/genetics; RNA; Biomarkers, Tumor/genetics; Neoplasm Recurrence, Local/genetics; Genomic Instability/genetics; Tumor Microenvironment
20.
bioRxiv ; 2023 Feb 10.
Article in English | MEDLINE | ID: mdl-36798154

ABSTRACT

Somatic mutational signatures elucidate molecular vulnerabilities to therapy and therefore detecting signatures and classifying tumors with respect to signatures has clinical value. However, identifying the etiology of the mutational signatures remains a statistical challenge, with both small sample sizes and high variability in classification algorithms posing barriers. As a result, few signatures have been strongly linked to particular risk factors. Here we present Diffsig, a model and R package for estimating the association of risk factors with mutational signatures, suggesting etiologies for the pre-defined mutational signatures. Diffsig is a Bayesian Dirichlet-multinomial hierarchical model that allows testing of any type of risk factor while taking into account the uncertainty associated with samples with a low number of observations. In simulation, we found that our method can accurately estimate risk factor-mutational signal associations. We applied Diffsig to breast cancer data to assess relationships between five established breast-relevant mutational signatures and etiologic variables, confirming known mechanisms of cancer development. Diffsig is implemented as an R package available at: https://github.com/jennprk/diffsig.
